(54) Design for scalable network management systems



(57) A system and methodology for building highly
scalable Network Management Systems (NMSs) for the
management of large voice and data networks is provided.
The NMS of the present invention is designed to
manage a plurality of network elements communicating
through network controlled links. The elements are
combined into management groups of no more than n 
elements. A first functional subsystem controls and 
manages the communication functions associated with 
a first group of elements. Each function or process 
inside the subsystem is guaranteed to communicate 
with only one instance of every other type of process or 
function. As additional network elements are added to 
the system, additional management groups are also 
added, one functional subsystem for every manage- 
ment group of elements. The plurality of other functional 
subsystems are substantially the same as the first func- 
tional subsystem. Limiting the management process to 
intra-subgroup communications greatly simplifies 
expansion of the system. When a total of m manage- 
ment groups exist, at least (m - 1) functional subsys- 
tems are replicated from the first NMS subsystem. 
Thus, the network is scaled up by using size-limited 
subsystems. 
Description

Background of the Invention 

[0001] This invention relates generally to communi- 
cation network management and, more particularly, to a 
system and method of scaling the management func- 
tions in an expanding communications network by repli- 
cating functionally complete subsystems of a fixed 
maximum size. The simple replication process permits 
expansion of the network without changing the scope of 
subsystem responsibilities. 

[0002] Modern communication networks can be 
composed of millions of functional elements, which can 
be hardware such as switches or multiplexers, geo- 
graphically dispersed across thousands of miles of 
service territory. Managing such a network means pro- 
viding for redundant call routing and responding to local 
emergencies. It is well known for a communications net- 
work to tightly monitor the individual phones, switch ele- 
ments, relays, base station, and the like. Monitoring the 
communication network elements yields information 
concerning the health, maintenance, current activity, 
performance, and security of these elements. Such 
information is collected at the local levels in the network, 
processed, and analyzed at higher levels of manage- 
ment. 

[0003] Additionally, the monitoring and diagnostic 
functions of communication network elements can be 
organized along specialized areas of focus, or network 
management tasks. For optimum performance, the 
information should efficiently summarize activity occur- 
ring at local levels in the network for use by administra- 
tors who manage the communications network from a 
regional or national perspective. It can be difficult to 
coordinate all the areas of narrowed focus into a com- 
prehensive picture of network problems at the higher 
levels. The administrator has the difficult task of analyz- 
ing problems occurring to network elements (NE)s 
through whatever filtering or processing functions the 
network imposes between the administrator and the 
NEs. 

[0004] The International Telecommunications
Union-Telecommunications Standardization Sector
(ITU-T) Telecommunications Management Network
(TMN) suggests a five-layer management structure.
The lowest level is the Network Element Layer (NEL),
including switches and transmission distribution
equipment. Above the NEL is the Element Management Layer
(EML), which manages the lower level elements, dealing
with issues such as capacity and congestion. The
Network Management Layer (NML) is concerned with
managing the communication network systems
associated with the NEL and EML. The Service Management
Layer (SML) manages the services that are offered to
the customers of the network, while the Business
Management Layer (BML) on top manages the business
and sets goals with respect to the customer and
government agencies.

[0005] Networks are typically composed of NEs
from a large variety of different vendors. Therefore,
there are a variety of Element Management Systems
(EMS) to support communications with the NE types.
The Network Management System (NMS) must
interface with divergent EMS level equipment and protocols.
It is the NMS systems that are responsible for
controlling the communications network and keeping it
functioning on a day-to-day basis. Network management
can be briefly described as the task of command,
control and monitoring of the network.
[0006] The ITU-T also divides management into
five Operations Support Systems (OSS) areas of
interest. They are: Fault Management; Configuration
Management; Account Management; Performance
Management; and Security Management, which are
collectively referred to as FCAPS. As is well understood
in the art, Fault Management is concerned with
detecting network equipment problems, responding to
detected problems, fixing the problems, and putting the
network back into working order. Fault monitoring is
usually done by receiving events from lower levels in the
network indicating a fault and processing these events.
This task can be very complex for large networks due to
the relationships between the network elements, such
as remote telephones, and the very high rate of events
that must be handled. Software systems must be
designed and built to handle these large data streams
and provide effective fault management features.

[0007] Configuration Management is concerned
with databases, backup systems, and provisioning and
enablement of new network resources. That is,
Configuration Management is the task of configuring the
network to provide services between the various network
elements. Configuring the network involves sending
messages to the network elements, which set parameter
values which permit signal paths to be established
between elements, and controlling the behavior of these
elements. The nature of modern networks makes this a
complex task best handled by software.
[0008] Account Management bills the network
customer for services rendered. Account Management is
the task of collecting the record of services used by
network elements. Usage information generates billing
data that makes up the revenue stream for the service
provider.

[0009] Performance Management is concerned
with collecting and analyzing data that indicates how
well the system is working. Performance Management
involves collecting information from the network
elements, which acts as a measure of network performance.
This "quality" measurement is critical for service
providers as it defines how well they are providing service to
their customers. This task is typically achieved by
directly polling network elements, or otherwise receiving
events from elements which convey such data.
[0010] Security Management controls and enables
NE functions. Security Management is the task of
managing security, including authentication and encryption,
in the services provided to the end customer. Portions
of each FCAPS function are performed at every layer of
the TMN architecture.
[0011] The Fault Management System is one of the
most critical systems in the network to control.
Intelligent NEs, able to perform self-diagnosis, may provide a
precise error message to the NMS. However, many NEs
merely send an alarm when a problem occurs. These
problems include switch failures, loss of power, line
failure, and loss of RF coverage (for wireless systems).
The NMS system collects the alarm data for analysis.
For example, an analysis could be performed to
determine a common failure mode among NEs in close
physical proximity. The NMS could then issue a repair
directive in response to the analysis. Intruder detection
and interlock switch detection are examples of some
security management issues that could be reported to
the NMS by NEs.
[0012] Modern networks are both large and
complex, and require the use of software for their
management. A NMS describes the conglomeration of
hardware and software functions required to manage
and control large voice and data communication
networks. NMS systems are also used for the control and
provisioning of heterogeneous networks. The design of
the NMS software typically follows the functional areas
outlined above. Today's NMS are typically distributed
systems using multiple software processes running on
multiple workstations to handle the various areas of
management.

[0013] Fig. 1 shows the block diagram of a typical
NMS (prior art). As the figure indicates, the NMS
components typically send messages to each other to
accomplish the management task. They also receive
events from the network over an event channel. This
channel itself is a software entity like any of the other
functional pieces.

[0014] The NMS is a very critical piece of the entire
communications network. It is the main tool for the service
provider to ensure that the network is performing optimally,
and that the customers are happy with the service they
receive. The system must also permit rapid
configuration of the network when new customers are added. All
these tasks must be performed at the highest levels of
performance and quality, even as the network grows in
size. Service providers spend large amounts of money
to come up with solutions that meet their needs.
However, the task of designing and building a highly scalable
NMS is a very challenging one.
[0015] Designing and building a good, highly
scalable NMS is not an exact science. There are two main
reasons for this. First, the traffic patterns of very large
and complex networks cannot be easily modeled.
Second, the traffic patterns of large and complex networks
cannot be accurately simulated in a lab. Therefore, NMS
designers must provide solutions for problems that are
not well defined or easily modeled. Gross assumptions
must be made on how the network will scale in size, and
what effect this scale has on the network management
tasks. A design strategy must be adopted based on
these assumptions. When these systems are deployed
in the field, many of the assumptions turn out to be
erroneous, resulting in poor performance of the NMS.
[0016] As a result of a poorly performing NMS, the 
service provider is hurt in two ways. First, the customer 
experiences the dissatisfaction of interfacing with a 
poorly performing system. Potentially, customers can be 
lost if service is inadequate. Second, the service pro- 
vider receives a poor return on their substantial invest- 
ment in the NMS. 

[0017] Apart from building the NMS on flawed 
assumptions, NMS designers can make design choices 
which exacerbate the problem. In some network 
designs, the cost of hardware can be cheaper than soft- 
ware, when the development and maintenance costs of 
the software are factored in. Regardless of the design 
philosophy, network expenditures are rarely viable if the 
underlying characterizations of the problems are inac- 
curate. 

[0018] When analyzing the NMS design to meet the 
issue of scalability, the key issue is how well the network 
will perform as the number of system elements 
increases. Designers must make decisions on which
component pieces of the system will be the least scala- 
ble. These potentially unscalable pieces are typically 
replicated, and multiple copies of that process are pre- 
pared. 

[0019] Fig. 2 illustrates an example of a system
function that is replicated to address the issue of scalability
(prior art). For example, if the Fault Management (FM)
process is considered to be the least scalable piece of
the system, a decision may be made which divides the
network to manage across some logical boundary and
runs multiple instances of the FM, with each FM being
assigned to a different division of the network. However,
all the other processes needed to interact with a FM
must now be designed to be aware of the fact that there
are multiple copies of the FM. A complicated policy of
routing requests to different FM modules in the network
is required. Further, a framework must be put in place to
inform these processes when additional instances of
FM are started to handle network load. This makes the
overall design of the system more complex. This
complexity also makes the testing of the design more
difficult and error prone.

[0020] In the above example, an assumption was 
made to make the FM the unit of replication, in response 
to the increased system size. If the assumption is 
wrong, then the original problem of scalability remains 
unaddressed, causing a very poor return on investment 
for NMS system expenditures. 
[0021] In the example presented above, the FM
may potentially be multi-threaded to increase its
performance. As is well known, multi-threading permits an
operating system to simultaneously execute different
parts (threads) of a program. Software multi-threading
is another common technique employed to increase
load handling capacity. However, it is difficult to run
threads simultaneously without interference, and
multi-threading is not always practical if incorrect
assumptions are made in the analysis phase.
[0022] Multi-threading is a powerful technique but
comes at a large cost. Designing and developing
multi-threaded software is acknowledged by the industry and
academia to be a very complex task. The resulting
software is very hard to test completely. Further, the
number of software developers that have the skill set to
write multi-threaded software is very limited. Such
designers are typically senior, at the high end of the pay
scale. In many cases, multi-threading is not a safe
option, as when the software has been developed by a
third party.

[0023] It would be advantageous if a method could 
be developed of scaling a communications network to a 
larger size without having to redesign or otherwise mod- 
ify the network management functions. 
[0024] It would be advantageous if an NMS could 
be grown to a larger size using the same functional sub- 
systems that were developed for the original NMS. 
[0025] It would be advantageous if network man- 
agement functions could be updated or tested in small 
manageable sections, so that the entire NMS did not 
have to be shut down or modified. 

Summary of the Invention

[0026] Accordingly, a scalable, minimally complex
NMS, with low development and maintenance costs, is
provided. The NMS of the present invention is designed
to manage a plurality of network elements
communicating through network controlled links. The elements are
organized into management groups of no more than n
elements. A first NMS subsystem controls and
manages the communication functions associated with a
first management group. A plurality of other NMS
subsystems exist in the system, one subsystem for every
management group of elements. The plurality of other
NMS subsystems are substantially the same as the first
functional NMS subsystem. Therefore, it can be said
that if a total of m management groups exist, at least
(m - 1) subsystems are replicated from the first NMS
subsystem.

[0027] The first NMS subsystem includes fault
management, configuration management, account
management, performance monitoring, and security
management functions, as described above. These
subsystem functions communicate with each other to
resolve problems and otherwise control the first
management group. The other replicated subsystem
functions also limit communications to interactions within
their own subsystem. Therefore, as the network
expands, and the subsystems are replicated, there is
never a concern with communications between
subsystems. Alternately stated, the number of communication
partners within each subsystem is limited to only one of
each type of function.
[0028] A method of scaling communication network
management function in a communications NMS is also
provided. The method comprises:

grouping network elements into m management
groups of no more than n elements;
configuring a first NMS subsystem to manage a
first management group;
limiting management of the first management group
to intra-subsystem communications; and
replicating (m - 1) additional NMS subsystems, one
for every management group.
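
The grouping and replication steps above can be illustrated with a short Python sketch. It is not taken from the specification: the function names group_elements and plan_subsystems are hypothetical, and the sketch assumes that elements are simply assigned to groups in list order, which the description does not require.

```python
def group_elements(elements, n):
    """Split the element list into management groups of no more than n elements."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [elements[i:i + n] for i in range(0, len(elements), n)]


def plan_subsystems(elements, n):
    """Return the groups, the group count m, and the number of replicas.

    One subsystem is configured first; the remaining (m - 1) are
    replicated from it, one per additional management group.
    """
    groups = group_elements(elements, n)
    m = len(groups)
    return {"groups": groups, "m": m, "replicas": max(m - 1, 0)}


if __name__ == "__main__":
    plan = plan_subsystems([f"NE{i}" for i in range(10)], n=4)
    print(plan["m"], plan["replicas"])
```

With ten elements and n = 4, the sketch yields three groups (sizes 4, 4, and 2), so one subsystem is configured and two are replicated.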

[0029] The present invention method permits the
NMS to be easily expanded. Then, the method further
comprises:

adding NEs to the network;
grouping NEs into p management groups of no
more than n elements, where p is greater than m;
and
replicating (p - m) additional NMS subsystems.
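
As a worked illustration of the expansion count, the following Python sketch computes how many subsystems must be added when the network grows. The function name additional_subsystems is hypothetical, and the sketch assumes that groups are filled to their maximum size n, so that p is the smallest possible number of groups; the method itself only requires that no group exceed n elements.

```python
import math


def additional_subsystems(m, total_elements, n):
    """Return (p - m), the number of subsystems to replicate after growth.

    m: number of management groups already deployed
    total_elements: element count after new NEs are added
    n: maximum number of elements per management group
    Assumption: groups are packed to capacity, so p = ceil(total / n).
    """
    p = math.ceil(total_elements / n)
    if p < m:
        raise ValueError("network shrank; expansion does not apply")
    return p - m
```

For example, with m = 3 existing groups, n = 4, and a network that has grown to 25 elements, p = 7 and four additional subsystems are replicated.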

[0030] The present invention also permits the
system to be easily updated, repaired, or tested. Then, the
method comprises:

creating a second NMS subsystem to upgrade the
management of network functions;
discontinuing the management of the first
management group by the first NMS subsystem; and
managing the first group of NEs with the second
NMS subsystem.

Brief Description of the Drawings 

[0031]

Fig. 1 shows the block diagram of a typical NMS
(prior art).

Fig. 2 illustrates an example of system function that
is replicated to address the issue of scalability (prior
art).

Fig. 3 illustrates the present inventive concept of
system level replication to solve the problem of
scalability.

Fig. 4 illustrates the management update feature of
the present invention NMS.

Fig. 5 is a flowchart illustrating a method for scaling
a network management function.

Fig. 6 illustrates the updating function of the method
described in Fig. 5.



EP 1 063 815 A2



Detailed Description of the Preferred Embodiment 

[0032] The present invention is a unique design
methodology for building highly scalable Network
Management Systems (NMS) to manage large voice and
data networks. A NMS is considered scalable if it is able
to maintain its level of service as the network being
managed grows in size. The level of service is typically
measured with metrics like responsiveness,
correctness, speed, etc. Modern networks have a very large
number of elements, approaching hundreds of
thousands, and even millions. The NMS must meet the
requirement of managing the large network as it grows,
while providing system operators with a view of an
integrated network. The building of scalable NMS software
can be mapped to developing a distributed software
system comprising multiple software processes working
in conjunction to exchange messages.
[0033] Fig. 3 illustrates the present inventive con- 
cept of system level replication to solve the problem of 
scalability. A communications Network Management 
System (NMS) 10 comprises a plurality of network ele- 
ments (NE)s 12. These elements can be remote tele- 
phones, landline telephones, fixed wireless stations, 
base stations, mobile switching centers, or the like. A 
first management group 14 includes no more than n 
network elements 12. A first NMS subsystem 16 (RS 1) 
includes a first plurality of management modules corre- 
sponding to a first plurality of management functions. 
Specifically shown are security module 18, configura- 
tion module 20, account module 22, fault management 
module 24, and performance module 26. The present 
invention is not limited to any particular number of man- 
agement functions, and other functions and manage- 
ment modules are possible in other aspects of the 
invention. Further, management functions may also be 
combined. 

[0034] Each management module 18-26 has a port
connected to the other modules through NMS
inter-process communication (IPC) services module 28.
Communications between modules 18-26 is limited to
intra-subsystem communications, or communications
inside the first NMS subsystem 16.
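
The restriction to intra-subsystem communications can be sketched in Python as follows. This is only an illustrative model: the classes IpcBus and Module, the helper build_subsystem, and the message format are assumptions introduced for the example and do not appear in the specification. The bus plays the role of the IPC services module 28, holding exactly one module of each type.

```python
KINDS = ("security", "configuration", "account", "fault", "performance")


class Module:
    """A management module; `kind` names its management function."""

    def __init__(self, kind):
        self.kind = kind
        self.bus = None      # set when the module is registered on a bus
        self.inbox = []

    def receive(self, sender_kind, message):
        self.inbox.append((sender_kind, message))


class IpcBus:
    """Hypothetical stand-in for the IPC services module of one subsystem."""

    def __init__(self, subsystem_id):
        self.subsystem_id = subsystem_id
        self._modules = {}   # module kind -> the single instance of that kind

    def register(self, module):
        # Enforce one instance of every module type per subsystem.
        if module.kind in self._modules:
            raise ValueError(f"{module.kind} already registered")
        self._modules[module.kind] = module
        module.bus = self

    def module(self, kind):
        return self._modules[kind]

    def send(self, sender, target_kind, message):
        # Only modules registered on this bus may use it, so messages
        # never cross from one subsystem into another.
        if sender.bus is not self:
            raise PermissionError("sender belongs to another subsystem")
        self._modules[target_kind].receive(sender.kind, message)


def build_subsystem(subsystem_id):
    """Create a subsystem; calling this again yields an independent replica."""
    bus = IpcBus(subsystem_id)
    for kind in KINDS:
        bus.register(Module(kind))
    return bus
```

Because each module only ever addresses a module type on its own bus, a module needs no knowledge of how many replicas exist elsewhere in the system.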
[0035] At least a second management group 30
(RS 2) is shown with no more than n network elements
12. At least a second NMS subsystem 32 is replicated
from the first NMS subsystem 16. Second NMS
subsystem 32 also contains a first plurality of management
modules, where each management module is
connected to other management modules for
communications limited to the second NMS subsystem 32.
[0036] The first NMS subsystem 16 (or second
NMS subsystem 32) is the portion of the system 10
under replication. The pieces within first NMS
subsystem 16 are designed to manage a network of size n,
where n is chosen to be a pessimistic estimate of a small
fraction of the final size of the network. Within this
system 10, each function or process inside the first NMS
subsystem 16 is guaranteed to communicate with only
one instance of every other type of process or function.
That is, it communicates with other functions within first
NMS subsystem 16, but not with functions in the second
NMS subsystem 32, even if the second NMS subsystem
32 was replicated from the first NMS subsystem 16. The
management modules 18-26 need not directly
communicate with functions in second NMS subsystem 32.
Limiting the network to intra-subsystem management
communications greatly simplifies the system design as
the system is expanded.

[0037] Each NMS subsystem 16 and 32 is designed
to handle n network elements 12. Thus, when the
system includes fewer than n elements, a single NMS
subsystem 16 is all that is required. When more than n
elements are deployed, additional workstations are
installed and another copy of first NMS subsystem 16 is
deployed, second NMS subsystem 32 for example.
Second NMS subsystem 32 operates independently of the
initially installed first NMS subsystem 16.

[0038] The network operators are given an
integrated view of the network through the set of top
management structure graphical user interfaces (GUIs) 34
that interact with the different NMS subsystems 16 and
32 using data dependent routing 36 and 38,
respectively. Each subsystem 16 and 32 is allotted a section of
the network to manage. This mapping is maintained in a
database 39. The GUIs 34 are operatively connected to
database 39 to locate network elements 12. This
information is used to route a user's request to the
appropriate subsystem. Further, system 10 instructions to
network elements 12 are routed through GUIs 34.
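
The data dependent routing described above can be illustrated with a brief Python sketch. The names ElementDirectory and Gui, and the representation of a subsystem as a plain callable, are assumptions made for the example; the specification does not define the interface of database 39 or of the GUIs 34.

```python
class ElementDirectory:
    """Hypothetical stand-in for the mapping database: element -> subsystem."""

    def __init__(self):
        self._owner = {}

    def assign(self, element_id, subsystem_id):
        self._owner[element_id] = subsystem_id

    def lookup(self, element_id):
        return self._owner[element_id]   # KeyError if the element is unknown


class Gui:
    """Routes each operator request to the subsystem that manages the element."""

    def __init__(self, directory, subsystems):
        self.directory = directory
        self.subsystems = subsystems     # subsystem id -> request handler

    def request(self, element_id, command):
        subsystem_id = self.directory.lookup(element_id)
        return self.subsystems[subsystem_id](element_id, command)


if __name__ == "__main__":
    directory = ElementDirectory()
    directory.assign("NE-1", "RS1")
    directory.assign("NE-9", "RS2")
    gui = Gui(directory, {
        "RS1": lambda ne, cmd: ("RS1", ne, cmd),
        "RS2": lambda ne, cmd: ("RS2", ne, cmd),
    })
    print(gui.request("NE-9", "reset"))
```

Only the GUI layer consults the mapping; the subsystems themselves remain unaware of one another, which is what keeps replication simple.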
[0039] Events from the network management
groups 14 and 30 make their way to the GUIs 34
through the appropriate event channels 40 and 42,
respectively. GUIs 34 that display or react to network
wide events are connected to all the event channels 40
and 42. The event channels 40 and 42 are easily
scalable since they are treated just as the other pieces of the
system. This is very important since many designers
use third party software for the event channel. The
properties of these third party channels are not well known
and not under the designer's control.
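
A network-wide GUI connected to every event channel might be modeled as in the Python sketch below. The class NetworkWideView is hypothetical, and the standard-library queue.Queue is used here merely as a placeholder for whatever event channel software a deployment actually uses.

```python
import queue


class NetworkWideView:
    """Hypothetical GUI-side collector attached to one channel per subsystem."""

    def __init__(self, channels):
        self.channels = channels         # subsystem id -> queue.Queue

    def drain(self):
        """Collect all pending events, tagged with their originating subsystem."""
        events = []
        for subsystem_id, channel in self.channels.items():
            while True:
                try:
                    events.append((subsystem_id, channel.get_nowait()))
                except queue.Empty:
                    break
        return events


if __name__ == "__main__":
    channels = {"RS1": queue.Queue(), "RS2": queue.Queue()}
    channels["RS1"].put("alarm: loss of power")
    channels["RS2"].put("alarm: line failure")
    print(NetworkWideView(channels).drain())
```

Adding a replicated subsystem then amounts to adding one more channel to the mapping; no existing channel has to change.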
[0040] The first NMS subsystem 16 includes a fault
management module 24. The fault management
module 24 monitors communications with network elements
12 in the first management group 14 for faults. The fault
management module 24 communicates with other
management modules 18, 20, 22, and 26 in the first NMS
subsystem 16 to locate the monitored faults, and take
corrective action to fix located faults.
[0041] Typically, network elements 12 have
parameters that must be set before network elements 12 can
communicate with each other. The first NMS subsystem
16 includes a configuration management module 20.
The configuration management module 20
communicates with other management modules 18 and 22-26 of
the first NMS subsystem 16 to set network element 12
parameters and to facilitate communication between
selected network elements 12.
[0042] The first NMS subsystem 16 includes an
account module 22. The account module 22
communicates with other management modules 18-20 and 24-26
of the first NMS subsystem 16 to generate billing data in
response to the provision of services to network
elements 12 in the first management group 14.
[0043] The first NMS subsystem 16 includes a
performance monitoring module 26. The performance
monitoring module 26 communicates with other
management modules 18-24 of the first NMS subsystem 16
to provide a measurement of the quality of service being
enjoyed by the network elements 12 in the first
management group 14.
[0044] The first NMS subsystem 16 includes a
security management module 18. The security module
18 communicates with other management modules
20-26 of the first NMS subsystem 16 to provide
authorization and encryption of services to the network elements
12 in the first management group 14. Typically, the five
above-named modules 18-26 are separate software
applications.

[0045] Fig. 4 illustrates the management update
feature of the present invention NMS. While the design
methodology presented above is intended for
building a scalable system 10, it also provides a neat solution
for another hard problem in network management, that
of network upgrading. When a network is providing
service, it is very common for the network elements to
be upgraded (both software and hardware) for a variety
of reasons. Moreover, new network element types may
also be added as more and more services are
developed. When the network is upgraded, the NMS 10 must
also be upgraded to handle the changes.
[0046] The network upgrade is carried out in an
incremental manner where portions of the network are
upgraded based on some schedule. The result of this
incremental upgrade is that the NMS is required to
manage different versions of the network element. New
versions of the functional subsystem are designed to
handle both old and new versions of the network
elements. The old elements are directed to the old
subsystem for management, while the upgraded network
elements are directed to the new subsystem version.
[0047] System 10 comprises a third NMS
subsystem 50 including a first plurality of updated management
modules. The management of the first management
group 14 of network elements 12 is discontinued by first
functional NMS subsystem 16 (not shown, see Fig. 3).
The third NMS subsystem 50 is now engaged to
manage the first management group 14 of network elements
12. The system can be further updated by replicating
the third NMS subsystem 50, and using the replication
to replace second NMS subsystem 32.
[0048] Fig. 5 is a flowchart illustrating a method for
scaling a network management function. Although the
process is depicted as having numbered steps for
clarity, the numbering should not be inferred to imply order
in the process unless explicitly stated. Step 100
provides an expanding network of communicating network
elements. Step 102 groups network elements into a first
and second management group of no more than n
network elements in each group. Step 104 configures a
first NMS subsystem to manage a first plurality of
functions for the first management group. Step 106
manages the first NMS subsystem through interactions
between the first plurality of first NMS subsystem
functions as described above in the explanation of Fig. 3.
Step 108 replicates the first NMS subsystem, creating a
second NMS subsystem to manage the first plurality of
functions for the second management group. Step 110
manages the second NMS subsystem through
interactions between the first plurality of second NMS
subsystem functions. Step 112 is a product, where
management is provided for an expanding network.
[0049] In some aspects of the invention, step 106 
includes managing the first NMS subsystem by limiting 
interactions to the first plurality of first NMS subsystem 
functions. Step 110 includes managing the second 
NMS subsystem by limiting the interactions to the first 
plurality of second NMS subsystem functions. 
[0050] In some aspects of the invention, further 
steps follow step 110. Step 114 adds network elements 
to the system. Step 116 groups network elements into p
additional management groups of no more than n
elements. Step 118 replicates p additional subsystems to
manage the p additional management groups.
[0051] Typically, step 106 includes sub-steps (not 
shown). Step 106a includes managing the first NMS 
subsystem by interacting a fault management function 
with the other subsystem functions to monitor the net- 
work for faults, locate the monitored faults, and take cor- 
rective action to fix located faults. In some aspects of 
the invention, step 102 includes the elements having 
parameter values, and step 106b includes managing 
the first NMS subsystem by interacting a configuration 
management function with other subsystem functions to 
set network element parameters and to facilitate com- 
munication between elements. Step 106c includes 
managing the first NMS subsystem by interacting an 
accounting function with other subsystem functions to 
generate billing data in response to the provision of 
services to network elements. Step 106d includes man- 
aging the first NMS subsystem by interacting a perform- 
ance monitoring function with other subsystem 
functions to measure the quality of service provided to 
network elements. Step 106e includes managing the 
first NMS subsystem by interacting a security function 
with other subsystem functions to provide authorization 
and encryption of services to network elements. 
[0052] Fig. 6 illustrates the updating function of the
method described by Fig. 5. Step 100 provides for the
updating of the network management functions. Then,
further steps follow step 112. Step 120 creates a third
NMS subsystem to manage a first plurality of upgraded
functions for the first management group. Step 122
discontinues the management of the first management
group by the first NMS subsystem, and step 124
manages the first management group of network elements
with the third NMS subsystem.
[0053] To complete the updating process, step 126
replicates a fourth NMS subsystem from the third NMS
subsystem. Then, step 128 discontinues the
management of the second management group by the second
NMS subsystem. Step 130 manages the second
management group of network elements with the fourth
NMS subsystem. Step 132 iteratively repeats steps 128
and 130, replacing NMS subsystems replicated from
the first NMS subsystem with NMS subsystems
replicated from the third NMS subsystem until all p
management groups are managed by an NMS subsystem
replicated from the third NMS subsystem.
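
The iterative replacement of steps 126 to 132 can be pictured with the following Python sketch. The generator rolling_upgrade and its arguments are hypothetical names chosen for the example, and the subsystems are represented by simple labels; discontinuing the old subsystem and engaging its replacement are reduced here to a single reassignment.

```python
def rolling_upgrade(assignments, replicate_upgraded):
    """Replace the subsystem of each management group, one group at a time.

    assignments: dict mapping group id -> currently managing subsystem
    replicate_upgraded: callable that returns a new subsystem replicated
        from the upgraded template for the given group id
    Yields (group id, old subsystem, new subsystem) after each swap.
    """
    for group_id in list(assignments):
        old = assignments[group_id]
        new = replicate_upgraded(group_id)   # replicate from upgraded subsystem
        assignments[group_id] = new          # old stops managing, new takes over
        yield group_id, old, new


if __name__ == "__main__":
    managed = {"group-1": "v1 subsystem A", "group-2": "v1 subsystem B"}
    for step in rolling_upgrade(managed, lambda g: f"v2 subsystem for {g}"):
        print(step)
```

At every point during the loop each management group is handled by exactly one subsystem, old or new, so the upgrade can proceed on a schedule without taking down the entire NMS.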
[0054] In some aspects of the invention, further
steps are included. Step 134 (not shown) creates a
database cross-referencing each network element to a
managing group, and step 136 (not shown) locates a
network element and routes system instructions to the
element. In some aspects of the invention, step 136
includes creating a graphical user interface (GUI) to
manipulate the location of network elements in the
database.

[0055] This invention of system level replication
allows the building and deployment of scalable NMS for
managing a network of size N*M at the development
and maintenance cost of an NMS that is designed to
manage a network of size N. Further, this design allows
a simple scheme for building incremental network
upgrade strategies. Other variations and embodiments
of the above-described invention will occur to those
skilled in the art.
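The size-limited grouping that underlies this scaling can be illustrated with a short Python sketch. The function name `group_elements` and the element identifiers are assumptions made for the example; the only property taken from the text is that each management group holds no more than n elements, with one replicated subsystem per group.

```python
from typing import List, Sequence


def group_elements(elements: Sequence[str], n: int) -> List[List[str]]:
    """Split network elements into management groups of no more than n."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Consecutive slices of length n; the last group may be smaller.
    return [list(elements[i:i + n]) for i in range(0, len(elements), n)]
```

With seven elements and n = 3, this yields three groups, so three subsystems of the same design would be deployed: the first plus two replicas.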
[0056] Where technical features mentioned in any
claim are followed by reference signs, those reference
signs have been included for the sole purpose of
increasing the intelligibility of the claims and
accordingly, such reference signs do not have any limiting
effect on the scope of each element identified by way of
example by such reference signs.

Claims 

1. A communications Network Management System 
(NMS) comprising: 

a plurality of network elements (NE)s; 

a first management group of no more than n
said NEs;

a first NMS subsystem for managing said first
management group, said first NMS subsystem
including a first plurality of management modules
corresponding to a first plurality of management
functions, each said management
module having a port connected to said other
management modules for communications limited
to said first NMS subsystem;
at least a second management group of no
more than n said NEs; and
at least a second NMS subsystem for managing
said second management group, replicated
from said first NMS subsystem, with a first plurality
of management modules, where each
said management module is connected to said
other management module for communications
limited to said second NMS subsystem.

2. The system of claim 1 in which said first NMS sub- 
system includes a fault management module, said 
fault management module monitoring communica- 
tions with network elements in said first manage- 
ment group of elements for faults, said fault 
management module communicating with said 
other management modules in said first NMS sub- 
system to locate the monitored faults, and taking 
corrective action to fix located faults. 

3. The system of claim 1 in which said plurality of NEs 
include parameter values to be set for interfacing 
between selected NEs; 

in which said first NMS subsystem includes a 
configuration management module, said con- 
figuration management module communicating 
with said other management modules in said 
first NMS subsystem to set said NE parame- 
ters, facilitating communications between 
selected NEs. 

4. The system of claim 1 in which said first NMS
subsystem includes an account module, said account
module communicating with said other management
modules in said first NMS subsystem to generate
billing data in response to the provision of
services to said NEs in said first management
group.

5. The system of claim 1 in which said first NMS
subsystem includes a performance monitoring module,
said performance monitoring module communicating
with said other management modules in said
first NMS subsystem to provide a measurement of
the quality of service being enjoyed by said NEs of
said first management group.

6. The system of claim 1 in which said first NMS sub- 
system includes a security module, said security 
module communicating with said other manage- 
ment modules in said first NMS subsystem to pro- 
vide authorization and encryption of services to 
said NEs of said first management group. 

7. The system of claim 1 further comprising:
a third NMS subsystem including a first plurality
of updated management modules;
in which the management of said first management
group of said elements is discontinued by
said first NMS subsystem; and
in which said third NMS subsystem is engaged
to manage said first management group of said
NEs.

8. The system of claim 1 further comprising:

at least one graphical user interface (GUI) for
monitoring said NE events in said first and second
NMS subsystems;

a mapping database, operatively connected to
said GUI, to locate said NEs; and
in which system instructions to said NEs are
routed through said GUI.

9. In a network of communicating network elements
(NE)s, a method of scaling a network management
function as the network expands, the method
comprising:

grouping NEs into first and second management
groups of no more than n NEs in each
group;

configuring a first Network Management System
(NMS) subsystem to manage a first plurality
of functions for the first management group;
managing the first NMS subsystem through
interactions between the first plurality of first
NMS subsystem functions;
replicating the first NMS subsystem, creating a
second NMS subsystem to manage the first
plurality of functions for the second management
group; and

managing the second NMS subsystem through
interactions between the first plurality of second
NMS subsystem functions.

10. The method of claim 9 in which managing the first
NMS subsystem includes limiting interactions to the
first plurality of first NMS subsystem functions; and

in which managing the second NMS subsystem
includes limiting interactions to the first plurality
of second NMS subsystem functions.

11. The method of claim 10 further comprising:

adding NEs to the system;
grouping NEs into p additional management
groups of no more than n NEs; and
replicating p additional NMS subsystems from
the first NMS subsystem to manage the p additional
management groups.



12. The method of claim 10 in which managing the first
NMS subsystem includes interacting a fault management
function with the other subsystem functions
to monitor the network for faults, locate the
monitored faults, and take corrective action to fix
located faults.

13. The method of claim 10 in which the grouping of 
NEs includes the NEs having parameter values 
which are set to interface selected NEs, and in 
which managing the first NMS subsystem includes 
interacting a configuration management function 
with other subsystem functions to set NE parame- 
ters, facilitating communication between selected 
NEs. 

14. The method of claim 10 in which managing the first
NMS subsystem includes interacting an accounting 
function with other subsystem functions to generate 
billing data in response to the provision of services 
to NEs. 

15. The method of claim 10 in which managing the first 
NMS subsystem includes interacting a perform- 
ance monitoring function with other subsystem 
functions to measure the quality of service provided 
to NEs. 

16. The method of claim 10 in which managing the first 
NMS subsystem includes interacting a security 
function with other subsystem functions to provide 
authorization and encryption of services to NEs. 

17. The method of claim 11 wherein the subsystem 
management functions are updated, and further 
comprising: 

creating a third NMS subsystem to manage a 
first plurality of upgraded functions for the first 
management group; 

discontinuing the management of the first man- 
agement group by the first NMS subsystem; 
and 

managing the first management group of net- 
work elements with the third NMS subsystem. 

18. The method of claim 17 further comprising:

replicating the third NMS subsystem to create a 
fourth NMS subsystem; and 
discontinuing the management of the second 
management group by the second NMS sub- 
system; and 

managing the second management group of 
NEs with the fourth NMS subsystem. 

19. The method of claim 18 further comprising:
iteratively repeating the replacement of NMS
subsystems replicated from the first NMS subsystem
with subsystems replicated from the
third NMS subsystem until all p management
groups are managed by an NMS subsystem
replicated from the third NMS subsystem.

20. The method of claim 10 further comprising: 

creating a database cross-referencing each NE
to a managing group; and 
locating a NE and routing system instructions 
to the NE. 

21. The method of claim 20 in which the location of NEs
includes creating a graphical user interface (GUI) to
manipulate the location of NEs in the database.